Structure-Based Web Pages Clustering

نویسنده

Leila Shahmoradi

چکیده

Recognizing similarities among the documents of a set is one of the objectives of retrieving information. The information related to the similarities of web pages can be used to present similar documents to users in order to retrieve considered information. In the present study, a new algorithm has been proposed to cluster web pages based on their structure. The proposed algorithm is based on hierarchical clustering designed based on SCAN algorithm. Cross-page link and in-page link structures have been tested to create a new similarity function and clustering algorithm based on volume criteria has been created for web pages in order to determine hierarchical relations. As the research findings reveal, the proposed method is effective to be used in uncovered structures of web pages in some organizations` websites.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

متن کامل

Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the application on the Internet. However, because of Information management hardware, high volume of data and word similarities in different fields the most answers to the user s’ questions aren`t correct. So the web graph clustering and cluster placement in corresponding answers helps user to achieve his or her intended results. Community (web communit...

متن کامل

Using Web Graph Structure for Person Name Disambiguation

In the third edition of WePS campaign we have undertaken the person name disambiguation problem referred to as a clustering task. Our aim was to make use of intrinsic link relationships among Web pages for name resolution in Web search results. To date, link structure has not been used for this purpose. However, Web graph can be a rich source of information about latent semantic similarity betw...

متن کامل

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing the search engine results, ranking takes place according to one or more parameters such as; Backward Links, Forward Links, Content, click through rate and etc. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

متن کامل

Link-Based Clustering for Finding Subrelevant Web Pages

We propose a new Web page clustering. Typical search engines only provide relevant pages, i.e., the pages matching users’ needs. However, we design our clustering method to provide non-relevant pages as search results when they refer to relevant pages and help users anticipate the contents of those relevant pages. We call such pages subrelevant. As it is difficult to improve Web search performa...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

Structure-Based Web Pages Clustering

نویسنده

چکیده

منابع مشابه

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

Finding Community Base on Web Graph Clustering

Using Web Graph Structure for Person Name Disambiguation

A New Hybrid Method for Web Pages Ranking in Search Engines

Link-Based Clustering for Finding Subrelevant Web Pages

عنوان ژورنال:

اشتراک گذاری